Abstract

We propose a simple model of pathologic microsatellite expansion, and describe an
inherent self-repairing mechanism working against expansion. We prove that if the
probabilities of elementary expansions and contractions are equal, microsatellite
expansions are always self-repairing. If these probabilities are different, self-reparation
does not work. Mosaicism, anticipation and reverse mutation cases are discussed in
the framework of the model. We explain these phenomena and provide some theoretical
evidence for their properties, for example the rarity of reverse mutations.

1 Introduction



Pathologic microsatellite expansion is a phenomenon causing several severe
diseases like Fragile X, Huntington disease, Myotonic Dystrophy and others
(Harper, 2001; Pearson, 2003; Pineiro et al., 2003; Libby et al., 2003; Pearson et al.,
2005). There are places in a DNA molecule where nucleotide sequences are repeated
several times. The number of such repeats (satellites) is usually stable
during normal replication. However, sometimes a mutation occurs, and the
mutated DNA has more (expansion) or fewer (contraction) repeats than its ancestor.
Normally the mutation rates are much smaller than one per generation per
locus (Ellegren, 2000b). However, in the case of the diseases mentioned above, the
expansions occur much faster, at a rate of hundreds or thousands per locus per
generation. We will call this phenomenon pathologic expansion to distinguish 
it from the much slower "normal" expansion. We are interested in the case 
where the number of nucleotides in the repeating sequence is small, e.g. when 
the repeated sequences are triplets. In this case the phenomenon is usually 
called microsatellite expansion. Sometimes, as in the case of Myotonic Dys- 
trophy Type 1 (DM1, OMIM #160900), the number of repeated triplets might 
actually reach thousands. For some diseases this expansion occurs in a coding 
part of DNA, for some in a non-coding one, but it is always a multi-system 
disease with multiple symptoms. 

There are several notable features of pathologic microsatellite expansion, com- 
mon to most diseases associated with it: 



Mosaicism: For most diseases the number of repeats is not the same in all
cells. Rather, it has a wide distribution of possible values. A notable exception
is Huntington disease (HD, OMIM #143100), where the mosaicism is
not as prominent as for other expansion-related diseases (Harper and Jones,
2002). The reason for this becomes clearer after we discuss the model. We
return to this disease in Section 4.

Anticipation: For some diseases a relatively small increase in the number of
repeats does not lead to symptoms. However, the stability of the repeated
sequence is lower than the stability of the non-affected DNA, and the children
of affected parents might show symptoms, sometimes severe (Harper,
2001).

Reverse Mutation: Sometimes the children of symptomatic patients have
the normal number of repeats. This is a rare, but still observable phenomenon
(Brunner et al., 1993; Monckton et al., 1995).



A theory of microsatellite expansion must naturally explain these phenomena. 



One of the most common explanations of microsatellite expansion is the formation
of hairpins either during replication (Cleary et al., 2002; Mirkin and Smirnova,
2002; Pearson, 2003; Yang et al., 2003) or during DNA repair after transcription
(Gomes-Pereira et al., 2004). In both these cases the hairpin formation
can cause either expansion or contraction of DNA. A recent review comparing
these explanations can be found in, e.g., Pearson et al. (2005). In this paper
we will not try to distinguish between these mechanisms; the proposed model
describes both. Therefore we will understand by cell events either cell divisions
in the first model or cell repair events in the second one.



This model is attractive because it can explain a number of features of mi- 
crosatellite expansion. In this model hairpins form during some, but not all, 
cell events. Therefore it is a random, rather than a deterministic process. Thus 
different cells have different numbers of expansions and contractions in their
histories. This explains mosaicism, i.e. a broad distribution of the number of
repeats in different cells within the same tissue.



Gametes in this model might have different numbers of repeats. If the number 
of repeats turns out to be small, the child of an affected parent will not be
affected. This explains reverse mutation. The calculations below show that
this phenomenon is indeed very rare. 

To explain anticipation, we can assume that the probability of expansion grows 
with the number of repeats in the DNA molecule. If the number of repeats 
in an asymptomatic patient increases, the probability of having a symptomatic
child also increases.

One of the ways to verify these speculations is to try to make possible con- 
clusions from the model, and to check whether these conclusions agree with 
the observed picture of microsatellite expansion. If they do, our confidence in
the model grows; if they do not, the model is wrong. This paper takes
the qualitative model described above for granted and tries to formalize it in 
the form of differential equations for the observed distribution of repeats. We 
solve these equations and show a qualitative agreement with the observations. 

It is interesting to compare the pathologic microsatellite expansion due to fast
mutations with the "normal" expansion due to slow ones, which is studied
(Ellegren, 2000b) as a way to infer data on the evolution
process. This approach has a promise of higher time resolution than other
methods because "slow" microsatellite mutations are still several orders of
magnitude faster than most other mutations (Ellegren, 2000b). Such a full comparison
of "normal" and pathologic expansions is beyond this paper, but one
might express a hope that in the future it will help to understand both better.
For example, while during "normal" mutations the expansions of the repeat
sequence are thought to be more probable than the contractions, there seems
to exist some mechanism that limits an uncontrolled expansion of microsatellites
(Garza et al., 1995; Amos and Rubinsztein, 1996; Harr and Schlötterer, 2000;
Xu et al., 2000). Apparently such a mechanism is absent or too weak for
pathologic microsatellite expansions. A comparative study might therefore
help to elucidate details and effects of this mechanism. On the other hand, 
one must be very cautious in the application of data and conclusions from the 
study of the "normal" expansions to the pathologic ones and vice versa. 



There are many theoretical works describing "normal" microsatellite expansion
using both analytical methods and computer simulations (see e.g. Tachida and Iizuka).
These works deal with slow changes in sequences
with a relatively small number of repeats. Our situation is rather the
opposite: we are interested in long sequences and fast change. Therefore our
formalism and results are quite different from theirs. 



2 Model 



First, let us discuss how one hairpin is formed. Consider a hairpin with
$h/2$ repeats. If $l_k$ is the number of repeats in a Kuhn segment of the polymer
(de Gennes, 1979; Painter and Coleman, 1997), then we gain $h/(2l_k)$ degrees
of freedom. The corresponding free energy loss is $kTh/l_k$, where $k$ is
the Boltzmann constant and $T$ is the temperature. On the other hand, the energy gain is
$\delta E\,h$, where $\delta E$ is the energy gain per repeat. Since both these contributions
are proportional to $h$, the total free energy change is also proportional to $h$:



$$\Delta F = Ch, \qquad C = \mathrm{const} \tag{1}$$



The value of the constant $C$ in this equation depends on the relation between
$\delta E$ and $kT/l_k$. If $C < 0$, the formation of hairpins causes a decrease of free
energy, and the longer the hairpins, the better. This would lead to a fast
de-stabilization of the number of repeats. Since this does not happen, we can
conclude that $C > 0$. This means that the formation of hairpins is not encouraged
by thermodynamics, and the formation of longer hairpins is suppressed
with the probability proportional to $\exp(-Ch)$. Since the probability exponentially
decreases with $h$, only the shortest possible hairpins are formed. The
minimal size of a hairpin depends on the flexibility of the molecule. It stands
to reason to assume it to be of the order of one or two Kuhn segments. Therefore the
microsatellite de-stabilization cannot start until the DNA has at least several
$l_k$ repeats.



These thermodynamic considerations explain anticipation: it is necessary to 
have at least several Kuhn segments in the microsatellite repeats interval to 
start the mechanism of de-stabilization. Of course cis-elements might subtly
influence hairpin formation at the early stages of de-stabilization. Therefore
they play an important role in the transition from anticipation to disease
(Brock et al., 1999; Cleary et al., 2002). It is interesting that a certain
threshold number of repeats is necessary for "normal" expansions too, at least
in some cases (Sibly et al., 2001, 2003; Shinde et al., 2003).



Let us now discuss a strand of DNA having $x$ repeats after $i$ cell events. The
next event can have one of three possible outcomes:

(1) No expansion or contraction occurred.

(2) There was an expansion of length $n$. Let $Q_{\mathrm{ins}}(x,n)$ be the probability of
this event.

(3) There was a contraction of length $n$. Let $Q_{\mathrm{del}}(x,n)$ be the probability of
this event.

Let $P_i(x)$ be the probability that the strand has exactly $x$ repeats. Then it
is easy to write the master equation, describing the transition from the step
number $i$ to the step number $i + 1$:

$$P_{i+1}(x) = P_i(x) + \sum_{n=1}^{\infty}\Bigl(P_i(x-n)\,Q_{\mathrm{ins}}(x-n,n) + P_i(x+n)\,Q_{\mathrm{del}}(x+n,n) - P_i(x)\,Q_{\mathrm{ins}}(x,n) - P_i(x)\,Q_{\mathrm{del}}(x,n)\Bigr) \tag{2}$$



Equation (2) might be simplified if we make the following assumption, based
on the thermodynamic considerations in the beginning of this Section. Namely,
we assume that the constant $C$ in equation (1) is large enough, so expansions
and contractions are in fact rare. If $n_{\min}$ is the minimal hairpin length allowed
by chain flexibility, then the only events to be considered in the sum (2) are
expansions and contractions of length $n_{\min}$. Now we must estimate the probabilities
of one expansion or contraction as functions of the number of repeats $x$. If
we consider for guidance "slow" mutations in the non-pathologic regime (see
Introduction), we see that there is a considerable controversy in the literature
about the dependence of mutation rate on $x$. Some authors report exponential
growth (Brinkmann et al., 1998; Whittaker et al., 2003; Lai and Sun,
2003a), while others report a much weaker linear relationship (Kruglyak et al.,
1998, 2000; Sibly et al., 2001; Shinde et al., 2003) or a more complex dependence
(Calabrese and Durrett, 2003; Sibly et al., 2003). Moreover, cis-factors,
obviously, should also influence the mutation rates. We can only agree with
Primmer et al. (1998): "These observations demonstrate that the mutation
process of microsatellites may be more complex than previously thought".
Fortunately the situation for relatively large $x$ can be simplified. Indeed, if
the number of repeats is sufficiently large, we can divide the stretch of microsatellites
into parts, each long enough to assume that a mutation or a repair
error in one part does not affect the other ones. Only the two end parts depend
on the cis-factors. If each part mutates independently, the overall mutation
rate should be proportional to the number of parts. This simple consideration
suggests that the mutation rates at least for large $x$ should be linear in $x$.
Moreover, they must go to zero as the number of repeats goes to $n_{\min}$. If we
set the origin of the $x$ axis to $n_{\min}$, we get simply:

$$Q_{\mathrm{ins}}(x,n) = q_{\mathrm{ins}}(n)\,x, \qquad Q_{\mathrm{del}}(x,n) = q_{\mathrm{del}}(n)\,x \tag{3}$$

With these assumptions equation (2) can be rewritten as:



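$$P_{i+1}(x) - P_i(x) = q_{\mathrm{ins}}(n_{\min})\Bigl(P_i(x - n_{\min})\,(x - n_{\min}) - P_i(x)\,x\Bigr) + q_{\mathrm{del}}(n_{\min})\Bigl(P_i(x + n_{\min})\,(x + n_{\min}) - P_i(x)\,x\Bigr) \tag{4}$$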
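Equation (4) can be iterated directly on a lattice of repeat numbers. The following Python sketch is only an illustration: the function names, the unit hairpin length `n_min = 1`, the rates `q_ins = q_del = 0.002`, the starting point of 10 lattice units and the grid size are all assumptions chosen to keep the run short, not estimates taken from data. Expansions are blocked at the upper edge of the grid so that the total probability is conserved.

```python
def master_step(P, q_ins, q_del, n_min=1):
    """Advance the lattice distribution P by one cell event, following equation (4).

    P[k] is the probability of having x = k * n_min repeats, where x is counted
    from n_min, so k = 0 is the state that can no longer form hairpins.
    """
    K = len(P) - 1
    new = list(P)
    for k in range(1, K + 1):
        x = k * n_min
        up = q_ins * x * P[k] if k < K else 0.0  # expansion k -> k+1, blocked at the grid edge
        down = q_del * x * P[k]                  # contraction k -> k-1
        new[k] -= up + down
        if k < K:
            new[k + 1] += up
        new[k - 1] += down
    return new


def evolve(x0_index, steps, q_ins, q_del, size=200):
    """Start from a delta peak at x0_index and apply `steps` cell events."""
    P = [0.0] * (size + 1)
    P[x0_index] = 1.0
    for _ in range(steps):
        P = master_step(P, q_ins, q_del)
    return P


# Assumed illustrative run: equal expansion and contraction rates.
P = evolve(x0_index=10, steps=5000, q_ins=0.002, q_del=0.002)
mean = sum(k * p for k, p in enumerate(P))
print(f"total probability = {sum(P):.6f}")
print(f"mean repeats      = {mean:.4f}")
print(f"stuck at zero     = {P[0]:.4f}")
```

With equal rates the mean number of repeats stays at its initial value, while the weight of the zero state keeps growing with the number of cell events.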



The next step is the transition from the discrete representation (4) to a continuous
one. We will "smooth" the variables $i$ and $x$. In order to do this we
will measure "time" $t$ in the number of events and consider $P$ to be a function
of the continuous variables $t$ and $x$, so $P(t, x)\,dx$ is the probability to have the
number of repeats between $x$ and $x + dx$ at the time $t$. Then we can rewrite
equation (4) in the continuous form as:



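$$\frac{\partial P(t,x)}{\partial t} = -c\,\frac{\partial \bigl(xP(t,x)\bigr)}{\partial x} + D\,\frac{\partial^2 \bigl(xP(t,x)\bigr)}{\partial x^2}, \qquad c = \bigl(q_{\mathrm{ins}}(n_{\min}) - q_{\mathrm{del}}(n_{\min})\bigr)\,n_{\min}, \qquad D = \frac{\bigl(q_{\mathrm{ins}}(n_{\min}) + q_{\mathrm{del}}(n_{\min})\bigr)\,n_{\min}^2}{2} \tag{5}$$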
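The passage from the discrete equation to the continuous one is a second-order Taylor expansion. A sketch of this step follows, with the shorthand $f(x) = xP(t,x)$ and $n = n_{\min}$ introduced here only for brevity, and under the assumption that one cell event corresponds to a unit time step and that $n$ is small compared with the scale on which $f$ varies.

```latex
% Shorthand (assumed here): f(x) = x P(t,x), n = n_min, one cell event = unit time step
f(x \mp n) \approx f(x) \mp n\,\frac{\partial f}{\partial x}
                 + \frac{n^2}{2}\,\frac{\partial^2 f}{\partial x^2}
\\[1ex]
\frac{\partial P}{\partial t}
  \approx q_{\mathrm{ins}}\,\bigl[f(x-n) - f(x)\bigr]
        + q_{\mathrm{del}}\,\bigl[f(x+n) - f(x)\bigr]
  \approx -\,(q_{\mathrm{ins}} - q_{\mathrm{del}})\,n\,\frac{\partial f}{\partial x}
        + \frac{(q_{\mathrm{ins}} + q_{\mathrm{del}})\,n^2}{2}\,\frac{\partial^2 f}{\partial x^2}
```

Matching coefficients identifies the drift coefficient as $(q_{\mathrm{ins}} - q_{\mathrm{del}})\,n_{\min}$ and the diffusion coefficient as $(q_{\mathrm{ins}} + q_{\mathrm{del}})\,n_{\min}^2/2$.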



Note that the quantity

$$J = cxP - D\,\frac{\partial (xP)}{\partial x} \tag{6}$$

has the meaning of flow of probability through the point $x$ at the time $t$. By
the way, this means that equation (5) has the simple meaning of the continuity
equation $\partial P/\partial t + \operatorname{div} J = 0$.

If the number of repeats in the zygote is $x_0$, then equation (5) has the following
initial condition:

$$P(0, x) = \delta(x - x_0) \tag{7}$$

where $\delta$ is Dirac's delta-function (e.g. Barton, 1989).



We will see that $P(t, +0)$ remains finite, so

$$\lim_{x \to +0} xP(t, x) = 0 \tag{8}$$

As shown below, the flow (6) remains non-zero at $x \to +0$. Therefore the
integral

$$f_m(t) = \int_{+0}^{\infty} P(t, x)\,dx \tag{9}$$

is not conserved. This integral represents the fraction of "mutant" cells, i.e.
cells with the number of repeats large enough to form hairpins and therefore
to be described by equation (5). The discontinuity of the function $P(t,x)$ at
$x = 0$ makes this integral less than 1. Its complement to 1 is the fraction of
cells which can no longer form hairpins and are "stuck at zero":

$$f_r(t) = 1 - f_m(t) \tag{10}$$

We will call such cells "repaired" cells. The increase of $f_r$ over time represents
a self-reparation effect.

We introduce the parameter

$$\gamma = \frac{c\,x_0}{D} \tag{11}$$

This parameter reflects the difference between the probabilities of expansion
and contraction. The case of $\gamma = 0$ corresponds to the situation when expansions
and contractions occur with equal probabilities. If expansions are more
probable, then $\gamma > 0$. Note that the value of $\gamma$ depends on the progenitor
number of repeats $x_0$. The greater $x_0$ is, the larger $\gamma$ is. We will see that this
parameter critically affects the microsatellite instability.

We will measure time in the units of $x_0/D$, i.e. we will introduce a dimensionless
variable

$$\tau = tD/x_0 \tag{12}$$

As shown in the Appendix, at reasonable values of the parameters one dimensionless
unit of time corresponds to about 25 cell events.



3 Results And Discussion 



The solution for equation (5) is obtained in Appendix A. Here we discuss the
properties of the solution and predictions of the model.

First we consider the fraction of repaired cells (see Appendix A):

$$f_r(\tau) = \exp\left(-\frac{\gamma}{1 - e^{-\gamma\tau}}\right) \tag{13}$$

The number of "repaired" cells increases with the time $\tau$. The speed of this
increase and the limit fraction at $\tau \to \infty$ depend on the parameter $\gamma$.

In the special case $\gamma = 0$, i.e. when expansions and contractions happen with
the same probability, equation (13) becomes $f_r(\tau) = \exp(-1/\tau)$. In this case
$f_r$ goes to 1 as $\tau \to \infty$. This means that all cells eventually become repaired.

In the case $\gamma > 0$ the limit at $\tau \to \infty$ is $\exp(-\gamma)$. For large enough $\gamma$
the fraction of repaired cells is small, but nevertheless not zero. This case
corresponds to the observed clinical picture.
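The two limiting cases can be checked numerically. The sketch below assumes the repaired fraction has the closed form $f_r(\tau) = \exp\bigl(-\gamma/(1 - e^{-\gamma\tau})\bigr)$, which reduces to $\exp(-1/\tau)$ at $\gamma = 0$ and tends to $\exp(-\gamma)$ at large $\tau$. The function name and the sample values are illustrative choices, and the sketch is meant for $\gamma \ge 0$ only.

```python
import math


def repaired_fraction(gamma, tau):
    """Fraction of repaired cells at dimensionless time tau, for gamma >= 0."""
    if tau <= 0.0:
        return 0.0
    if abs(gamma * tau) < 1e-8:
        # gamma -> 0 limit: expansions and contractions equally probable
        return math.exp(-1.0 / tau)
    denominator = -math.expm1(-gamma * tau)  # 1 - exp(-gamma * tau), computed accurately
    return math.exp(-gamma / denominator)


for gamma in (0.0, 1.0, 7.0):
    values = ", ".join(f"{repaired_fraction(gamma, tau):.4g}" for tau in (0.5, 1.0, 5.0, 50.0))
    print(f"gamma = {gamma}: f_r at tau = 0.5, 1, 5, 50 -> {values}")
```

For $\gamma = 0$ the values creep towards 1, while for $\gamma > 0$ they saturate at $\exp(-\gamma)$.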

Now we can explain the phenomenon of reverse mutation. If a parent is affected,
the gamete might carry DNA either from the repaired population
or from the unrepaired, mutant population. In the first case a reverse mutation
occurs. Therefore the probability of reverse mutation is $\exp(-\gamma)$. It seems
that reverse mutations are very rare events. In the case of Myotonic Dystrophy
(Brunner et al., 1993) the probability of reverse mutation is very small.
We will rather arbitrarily estimate it as 1 : 1000; a more frequent occurrence
would be observed more often, and a rarer one would not be observed at
all. This gives the following estimate for $\gamma$:

$$\gamma \approx 7 \tag{14}$$



Plots of $f_r(\tau)$ for several values of $\gamma$ are shown in Figure 1. It can be seen
from these plots that the number of repaired cells quickly reaches the limit
value. This justifies the assumption that the fraction of repaired cells in the
gametes is equal to the limiting value.
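Both the estimate of $\gamma$ and the speed of saturation are easy to reproduce. The sketch below takes the rough 1 : 1000 reverse-mutation probability from the text, inverts $\exp(-\gamma)$, and compares the repaired fraction with its limiting value at a few times. The closed form used for the repaired fraction, $\exp\bigl(-\gamma/(1 - e^{-\gamma\tau})\bigr)$, and the chosen values of $\tau$ are assumptions of this illustration.

```python
import math


def repaired_fraction(gamma, tau):
    """Assumed closed form of the repaired fraction for gamma > 0."""
    return math.exp(-gamma / (1.0 - math.exp(-gamma * tau)))


reverse_mutation_probability = 1.0 / 1000.0  # the rough 1 : 1000 estimate
gamma = -math.log(reverse_mutation_probability)
print(f"gamma = {gamma:.2f}")

plateau = math.exp(-gamma)
ratios = {}
for tau in (0.25, 0.5, 1.0, 2.0):
    ratios[tau] = repaired_fraction(gamma, tau) / plateau
    print(f"tau = {tau:4.2f}: f_r / plateau = {ratios[tau]:.4f}")
```

At $\tau = 1$ the repaired fraction is already within about one percent of its limit, which is the sense in which the limit is reached quickly.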

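Let us now return to the solution of equation (5). It can be expressed through
the mean number of repeats $m$ and standard deviation $\sigma$ (see Appendix A):

$$P(x,t) = \frac{2m^2}{\sigma^2\sqrt{mx}}\exp\left[-\frac{2m^2}{\sigma^2}\left(1 + \frac{x}{m}\right)\right] I_1\left[\frac{4m^2}{\sigma^2}\left(\frac{x}{m}\right)^{1/2}\right] \tag{15}$$

where $I_1$ is the modified Bessel function (Abramowitz and Stegun, 1972, § 9).
The mean number of repeats and standard deviation depend on time as

$$m = x_0\exp(\gamma\tau), \qquad \sigma = \left[\frac{2\,\bigl(1 - \exp(-\gamma\tau)\bigr)}{\gamma}\right]^{1/2} x_0\exp(\gamma\tau) \tag{16}$$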
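A numerical sanity check of the distribution is sketched below. It assumes the density has the form $P = \frac{2m^2}{\sigma^2\sqrt{mx}}\exp\bigl[-\frac{2m^2}{\sigma^2}(1 + x/m)\bigr]\,I_1\bigl[\frac{4m^2}{\sigma^2}(x/m)^{1/2}\bigr]$, evaluates $I_1$ from its standard integral representation using only the standard library, and checks that the density integrates to the mutant fraction $1 - \exp(-2m^2/\sigma^2)$ and has first moment $m$. The values $x_0 = 100$, $\gamma = 1$, $\tau = 1$ and the grid sizes are hypothetical choices for the illustration.

```python
import math


def i1_scaled(z, n=200):
    """exp(-z) * I_1(z), from I_1(z) = (1/pi) * int_0^pi exp(z cos t) cos t dt (midpoint rule)."""
    h = math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += math.exp(z * (math.cos(t) - 1.0)) * math.cos(t)
    return total * h / math.pi


def repeat_density(x, m, sigma):
    """Assumed form of the repeat-number density; exponentials are merged to avoid overflow."""
    a = 2.0 * m * m / (sigma * sigma)
    root = math.sqrt(x / m)
    # exp(-a (1 + x/m)) * I_1(2 a root) = exp(-a (1 - root)^2) * [exp(-z) I_1(z)], z = 2 a root
    return a / math.sqrt(m * x) * math.exp(-a * (1.0 - root) ** 2) * i1_scaled(2.0 * a * root)


# Hypothetical example values
x0, gamma, tau = 100.0, 1.0, 1.0
m = x0 * math.exp(gamma * tau)
sigma = math.sqrt(2.0 * (1.0 - math.exp(-gamma * tau)) / gamma) * m

steps, x_max = 4000, 40.0 * m
dx = x_max / steps
total = 0.0
mean = 0.0
for j in range(steps):
    x = (j + 0.5) * dx          # midpoint rule in x
    weight = repeat_density(x, m, sigma) * dx
    total += weight
    mean += x * weight

mutant_fraction = 1.0 - math.exp(-2.0 * m * m / sigma ** 2)
print(f"integral of P = {total:.5f}, expected mutant fraction = {mutant_fraction:.5f}")
print(f"first moment  = {mean:.3f}, m = {m:.3f}")
```

The integral is less than 1 because the remaining probability sits in the repaired cells at zero, which nevertheless contribute nothing to the first moment.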



Also interesting are skewness and kurtosis of the distribution. They are

$$S = 3\left[\frac{1 - \exp(-\gamma\tau)}{2\gamma}\right]^{1/2}, \qquad K = 6\,\frac{1 - \exp(-\gamma\tau)}{\gamma} \tag{17}$$



At the early stage of instability growth ($\gamma\tau \ll 1$) these equations describe a sharp
distribution centered around $m$. The ratio of the distribution width $2\sigma$ to the
mean size $m$ is small (about $2(2\tau)^{1/2}$).

However, at later stages ($\gamma\tau \gg 1$) the picture is completely different. At these
stages the curve is very wide. The ratio of the width to the mean size is
then $2\sigma/m = 2(2/\gamma)^{1/2} \approx 1$. This large width explains the observed
mosaicism.

Fig. 1. Fraction of Repaired Cells As Function of Time

The transition from the first regime to the second one depends on $\gamma$, and thus
on the progenitor number of repeats $x_0$. The larger $x_0$ is, the earlier is the
transition to the second regime, i.e. the regime of developed instability.

The distribution has positive skewness and kurtosis. They are about zero at
early stages, and tend to $3/(2\gamma)^{1/2} \approx 0.8$ and $6/\gamma \approx 0.9$ correspondingly.
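The numbers quoted for the two regimes can be reproduced with a few lines of Python. The sketch assumes $m = x_0 e^{\gamma\tau}$, $\sigma = [2(1 - e^{-\gamma\tau})/\gamma]^{1/2}\,m$, $S = 3[(1 - e^{-\gamma\tau})/(2\gamma)]^{1/2}$ and $K = 6(1 - e^{-\gamma\tau})/\gamma$. The function name, $x_0 = 100$ and the sample times are illustrative choices.

```python
import math


def shape_statistics(gamma, tau, x0):
    """Mean, standard deviation, skewness and kurtosis of the repeat distribution."""
    decay = 1.0 - math.exp(-gamma * tau)
    m = x0 * math.exp(gamma * tau)
    sigma = math.sqrt(2.0 * decay / gamma) * m
    skewness = 3.0 * math.sqrt(decay / (2.0 * gamma))
    kurtosis = 6.0 * decay / gamma
    return m, sigma, skewness, kurtosis


gamma, x0 = 7.0, 100.0
for tau in (0.001, 0.1, 5.0):
    m, sigma, S, K = shape_statistics(gamma, tau, x0)
    print(f"tau = {tau}: 2*sigma/m = {2.0 * sigma / m:.3f}, S = {S:.3f}, K = {K:.3f}")
```

The first line of output corresponds to the sharp early distribution, the last one to the regime of developed instability.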

Typical plots of the distribution of repeat lengths are shown in Figures 2 and 3.

Fig. 2. Repeat Number Distribution For Unrepaired Cells, small $\gamma$

4 Conclusions 



We have shown that a very simple model of pathologic microsatellite expansion 
can qualitatively explain the observed phenomena of anticipation, mosaicism 
and spontaneous recovery. This model considers expansion or contraction of 
repeats as a random process with the probability of expansion and contraction 
related to the probability of hairpin formation. A mathematical model based 
on this picture is able to predict the shape of the distribution of the number 
of repeats after many divisions. 

This model predicts a natural "reparation process" leading to reverse muta- 
tion. In the case when the probabilities of expansion and contraction are equal, 
this process eventually heals the mutation. Therefore mutation survives only if 
the probability of expansion exceeds the probability of contraction. The frac- 
tion of repaired cells in the long run depends on this difference in probabilities. 

We implicitly assumed that only "young" cells, the ones belonging to the
latest generation, are used in the measurements of the number of repeats. If
this assumption is not satisfied, the observed length distribution should be
obtained by summation of the results over generations of cells. However, if
the "older" cells die due to apoptosis, this effect is small. For example, if the
apoptosis for blood cells occurs after 25-40 mitoses, as it is usually thought,
then the effect is indeed negligible for blood samples.

Fig. 3. Repeat Number Distribution For Unrepaired Cells, large $\gamma$

Another interesting question is the possibility of selection: the rate of cell 
survival and multiplication might depend on the level of the mutation of mi- 
crosatellite expansion. This will change the rates of expansion and contraction 
for the cell population as a whole. 

A notable exception from the general picture of trinucleotide expansion diseases
is Huntington disease. Mosaicism for this disease is not as prominent as
for other dynamic mutations (Harper and Jones, 2002). However, a closer look
shows that this example actually does not contradict our model. The number 
of repeats for HD is rather small (several dozens). It seems that the mutation 
in this case is caused by a small number of relatively large expansions, rather 
than a large number of small expansions, as assumed in this paper. 



It would be interesting to extend the analysis of this paper to quantitative 
comparison with the experimental data. This will be done in subsequent works. 



A further comparison of the fast pathologic mutations and slow "normal" ones
seems also to be promising.



A Solution Of Master Equation



In this Appendix we provide the solution of master equation (5). This equation
is easier to solve in the following dimensionless variables:

$$\xi = x/x_0, \qquad \tau = tD/x_0, \qquad p(\tau,\xi) = x_0 P(x,t) \tag{A.1}$$

Let us roughly estimate these parameters. Taking values approximating the
known data about DM1, we get

$$x_0 \approx 10^2, \qquad q_{\mathrm{ins}} \approx q_{\mathrm{del}} \sim 10^{-2}, \qquad n_{\min} \approx 20 \tag{A.2}$$

so

$$\xi \approx 10^{-2}x, \qquad \tau \approx 0.04t \tag{A.3}$$

In other words, one dimensionless unit of $\tau$ corresponds approximately to 25
cell events, while one dimensionless unit of $\xi$ corresponds approximately to
100 repeats.
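The conversion between cell events and dimensionless time is simple arithmetic. The sketch below assumes $x_0 = 100$, $q_{\mathrm{ins}} = q_{\mathrm{del}} = 0.01$ and $n_{\min} = 20$, values chosen to be consistent with the stated 25 cell events and 100 repeats per dimensionless unit, together with the diffusion coefficient $D = (q_{\mathrm{ins}} + q_{\mathrm{del}})\,n_{\min}^2/2$ that follows from a Taylor expansion of the master equation. The variable names are illustrative.

```python
# Assumed order-of-magnitude parameters for DM1
x0 = 100.0     # progenitor number of repeats
q_ins = 0.01   # expansion probability per repeat per cell event
q_del = 0.01   # contraction probability per repeat per cell event
n_min = 20     # minimal hairpin length, in repeats

c = (q_ins - q_del) * n_min              # drift coefficient
D = (q_ins + q_del) * n_min ** 2 / 2.0   # diffusion coefficient
tau_per_event = D / x0                   # dimensionless time added by one cell event
events_per_tau_unit = x0 / D
gamma = c * x0 / D                       # zero here, since the two rates were taken equal

print(f"D = {D}, tau per cell event = {tau_per_event}")
print(f"cell events per unit of tau = {events_per_tau_unit}")
print(f"gamma = {gamma}")
```

Taking the two rates exactly equal gives $\gamma = 0$. A positive $\gamma$ requires $q_{\mathrm{ins}}$ to exceed $q_{\mathrm{del}}$ slightly, which does not change the order-of-magnitude estimate of the time unit.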

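Let us introduce the function

$$v(\tau,\xi) = \xi p(\tau,\xi) \tag{A.4}$$

Then equation (5) can be rewritten as

$$\frac{\partial v}{\partial \tau} = -\gamma\xi\,\frac{\partial v}{\partial \xi} + \xi\,\frac{\partial^2 v}{\partial \xi^2} \tag{A.5}$$

We use Laplace transform with respect to $\xi$:

$$V(\tau,s) = \int_0^{\infty} e^{-s\xi}\,v(\tau,\xi)\,d\xi \tag{A.6}$$

For the Laplace transform of the derivatives we have (see (Abramowitz and Stegun,
1972, § 29.2.5))

$$\frac{\partial v}{\partial \xi} \doteq sV - v(\tau,+0), \qquad \frac{\partial^2 v}{\partial \xi^2} \doteq s^2V - s\,v(\tau,+0) - v'(\tau) \tag{A.7}$$

where $\doteq$ means Laplace transform, and

$$v'(\tau) = \left.\frac{\partial v(\tau,\xi)}{\partial \xi}\right|_{\xi = +0} \tag{A.8}$$

Multiplication by $-\xi$ corresponds to differentiation by $s$ (see (Abramowitz and Stegun,
1972, § 29.2.9)). Due to the border condition (8), $v(\tau,+0) = 0$, while $v'(\tau)$
in equation (A.7) does not depend on $s$ and vanishes upon differentiation. Therefore we can rewrite
equation (A.5) in Laplace space as

$$\frac{\partial V}{\partial \tau} = -\frac{\partial}{\partial s}\Bigl[(s^2 - \gamma s)\,V\Bigr] \tag{A.9}$$

with the initial condition from equation (7)

$$V(0, s) = \exp(-s) \tag{A.10}$$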



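Let

$$U(\tau,s) = (s^2 - \gamma s)\,V(\tau,s) \tag{A.11}$$

Then we can rewrite equations (A.9) and (A.10) as

$$\frac{\partial U}{\partial \tau} + (s^2 - \gamma s)\,\frac{\partial U}{\partial s} = 0 \tag{A.12}$$

$$U(0,s) = (s^2 - \gamma s)\exp(-s) \tag{A.13}$$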



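The general solution of differential equation (A.12) is

$$U(\tau,s) = \Phi\left(\frac{s - \gamma}{s}\,e^{-\gamma\tau}\right) \tag{A.14}$$

where $\Phi$ is an arbitrary function.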



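Taking into account the initial condition (A.13) and returning to the function
$V$, we get

$$V(\tau,s) = \frac{\gamma^2 e^{-\gamma\tau}}{\bigl(s(1 - e^{-\gamma\tau}) + e^{-\gamma\tau}\gamma\bigr)^2}\exp\left[-\frac{s\gamma}{s(1 - e^{-\gamma\tau}) + e^{-\gamma\tau}\gamma}\right] \tag{A.15}$$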



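The reverse Laplace transform of this expression (see (Abramowitz and Stegun,
1972, § 29.3.81)), together with definition (A.4), gives

$$p(\tau,\xi) = \frac{\gamma\,(e^{-\gamma\tau}/\xi)^{1/2}}{1 - e^{-\gamma\tau}}\exp\left[-\frac{\gamma\,(1 + \xi e^{-\gamma\tau})}{1 - e^{-\gamma\tau}}\right] I_1\left[\frac{2\gamma\,(\xi e^{-\gamma\tau})^{1/2}}{1 - e^{-\gamma\tau}}\right] \tag{A.16}$$

which provides the solution of the master equation.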



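Using the asymptotic (Abramowitz and Stegun, 1972, § 9.6.7)

$$I_1(z) \approx z/2, \qquad z \ll 1 \tag{A.17}$$

we see that at $\xi \to +0$

$$p(\tau, +0) = \frac{\gamma^2 e^{-\gamma\tau}}{(1 - e^{-\gamma\tau})^2}\exp\left[-\frac{\gamma}{1 - e^{-\gamma\tau}}\right] \tag{A.18}$$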



The flow at $\xi \to +0$ is non-zero. The total fraction of repaired cells can
be calculated by calculating the integral of flow. Rewriting equation (6) in
dimensionless coordinates, we see that

$$f_r(\tau) = \int_0^{\tau} p(u, 0)\,du = \exp\left(-\frac{\gamma}{1 - e^{-\gamma\tau}}\right) \tag{A.19}$$

which gives equation (13).

The moments of function $p(\xi)$ are defined as

$$M_n = \int_{+0}^{\infty} \xi^n\,p(\tau,\xi)\,d\xi \tag{A.20}$$

Laplace transform gives

$$M_{n+1} = (-1)^n \left.\frac{\partial^n V}{\partial s^n}\right|_{s=0}, \qquad n = 0,1,2,\ldots \tag{A.21}$$



Differentiating the function $V$, calculating central moments (see (Abramowitz and Stegun,
1972, § 26)) and returning to dimensional coordinates, we obtain equations (16)
and (17) for mean, deviation, skewness and kurtosis. After transformation of
equation (A.16) to dimensional coordinates and substitution of equations (16),
we obtain equation (15).
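The differentiation step can be checked numerically. The sketch below assumes that the Laplace transform of $v = \xi p$ has the closed form $V(\tau,s) = \gamma^2 e^{-\gamma\tau}\,d^{-2}\exp(-s\gamma/d)$ with $d = s(1 - e^{-\gamma\tau}) + e^{-\gamma\tau}\gamma$, so that $V$ at $s = 0$ is the first moment of $p$ and minus its $s$-derivative at $s = 0$ is the second moment. The derivative is taken by a central finite difference. The function name, the step `h` and the values $\gamma = \tau = 1$ are illustrative choices.

```python
import math


def laplace_V(tau, s, gamma):
    """Assumed closed form of the Laplace transform of v = xi * p."""
    e = math.exp(-gamma * tau)
    d = s * (1.0 - e) + e * gamma
    return gamma ** 2 * e / d ** 2 * math.exp(-s * gamma / d)


gamma, tau, h = 1.0, 1.0, 1e-4

first_moment = laplace_V(tau, 0.0, gamma)
second_moment = -(laplace_V(tau, h, gamma) - laplace_V(tau, -h, gamma)) / (2.0 * h)

mean = first_moment
variance = second_moment - first_moment ** 2

expected_mean = math.exp(gamma * tau)
expected_variance = 2.0 * (1.0 - math.exp(-gamma * tau)) / gamma * math.exp(2.0 * gamma * tau)

print(f"mean     = {mean:.6f} (expected {expected_mean:.6f})")
print(f"variance = {variance:.6f} (expected {expected_variance:.6f})")
```

The expected values are the dimensionless forms of the mean and of the squared standard deviation, i.e. the dimensional expressions with $x_0$ set to 1.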